Search results for "Genomic data"
showing 10 items of 23 documents
Penalized regression and clustering in high-dimensional data
The main goal of this Thesis is to describe numerous statistical techniques that deal with high-dimensional genomic data. The Thesis begins with a review of the literature on penalized regression models, with particular attention to least absolute shrinkage and selection operator (LASSO) or L1-penalty methods. L1 logistic/multinomial regression models are used for variable selection and discriminant analysis with a binary/categorical response variable. The Thesis discusses and compares several methods that are commonly utilized in genetics, and introduces new strategies to select markers according to their informative content and to discriminate clusters by offering reduced panels for popul…
Whole-Genome Analyses
2014
Average nucleotide identity (ANI) was proposed almost 10 years ago as a means to compare genetic relatedness among prokaryotic strains. Values around 95% were found to correspond to the 70% DNA–DNA hybridization cut-off value that is widely used to delineate archaeal and bacterial species. ANI calculations are one of many approaches that can be derived from comparative genomic data and used for taxonomic purposes. Here, an overview of the impact and current usage of ANI values is given, together with details of JSpecies, an existing user-friendly, biology-oriented software package that can be used to generate two types of ANI calculations …
Glomeromycotina: what is a species and why should we care?
2018
A workshop at the recent International Conference on Mycorrhiza was focused on species recognition in Glomeromycotina and parts of their basic biology that define species. The workshop was motivated by the paradigm-shifting evidence derived from genomic data for sex and for the lack of heterokaryosis, and by published exchanges in Science that were based on different species concepts and have led to differing views of dispersal and endemism in these fungi. Although a lively discussion ensued, there was general agreement that species recognition in the group is in need of more attention, and that many basic assumptions about the biology of these important fungi includ…
Functional comparison of bacteria from the human gut and closely related non-gut bacteria reveals the importance of conjugation and a paucity of moti…
2016
The human GI tract is a complex and still poorly understood environment, inhabited by one of the densest microbial communities on earth. The gut microbiota has been shaped by millennia of evolution to co-exist with the host in commensal or symbiotic relationships. Members of the gut microbiota perform specific molecular functions important in the human gut environment. This can be illustrated by the presence of a highly expanded repertoire of proteins involved in carbohydrate metabolism, consistent with the large diversity of polysaccharides originating from the diet or from the host itself that can be encountered in this environment. In order to identify other bacterial fun…
Reconfigurable Accelerator for the Word-Matching Stage of BLASTN
2013
BLAST is one of the most popular sequence analysis tools used by molecular biologists. It is designed to efficiently find biologically significant regions of similarity between two sequences. However, because the size of genomic databases is growing rapidly, the computation time of BLAST, when performing a complete genomic database search, is continuously increasing. Thus, there is a clear need to accelerate this process. In this paper, we present a new approach for genomic sequence database scanning utilizing reconfigurable field-programmable gate array (FPGA)-based hardware. In order to derive an efficient structure for BLASTN, we propose a reconfigurable architecture to accelerate the…
Ten millennia of hepatitis B virus evolution
2021
Hepatitis B virus (HBV) has been infecting humans for millennia and remains a global health problem, but its past diversity and dispersal routes are largely unknown. We generated HBV genomic data from 137 Eurasians and Native Americans dated between ~10,500 and ~400 years ago. We date the most recent common ancestor of all HBV lineages to between ~20,000 and 12,000 years ago, with the virus present in European and South American hunter-gatherers during the early Holocene. After the European Neolithic transition, Mesolithic HBV strains were replaced by a lineage likely disseminated by early farmers that prevailed throughout western Eurasia for ~4000 years, declining around the end of the 2nd…
Reactome graph database: Efficient access to complex pathway data
2018
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its qu…
According to the CPLL proteome sheriffs, not all aperitifs are created equal!
2014
Combinatorial peptide ligand libraries (CPLLs) have been adopted for investigating the proteome of a popular aperitif in Northern Italy, called "Amaro Branzi", stated to be an infusion of a secret herbal mixture, of which some ingredients are declared on the label, namely Angelica officinalis, Gentiana lutea and orange peel, sweetened by a final addition of honey. In order to assess the genuineness of this commercial liqueur, we have prepared extracts of the three vegetable ingredients, assessed their proteomes, and compared them to the one found in the aperitif. The amaro's proteome was identified via prior capture with CPLLs at two different pH values (2.2 and 4.8). Via mass spectrometry …
Detection of batch effects in liquid chromatography-mass spectrometry metabolomic data using guided principal component analysis
2014
Metabolomics based on liquid chromatography-mass spectrometry (LC-MS) is a powerful tool for studying dynamic responses of biological systems to different physiological or pathological conditions. Differences in the instrumental response within and between batches introduce unwanted and uncontrolled data variation that should be removed to extract useful information. This work exploits a recently developed method for the identification of batch effects in high throughput genomic data based on the calculation of a delta statistic through principal component analysis (PCA) and guided PCA. Its applicability to LC-MS metabolomic data was tested on two real examples. The first example involved t…